Course goals
- Understand the data structure of microbiome studies
- Demonstrate how to process microbiome data into a usable format
- Explore processed microbiome data
- Test hypotheses
- Write a paper putting results into context of other research
Course requirements
- Laptop with R and RStudio installed
- Previous experience with R programming
  - data cleaning / transformations
  - plotting with ggplot
  - file management
  - model training and evaluation
Course setup
- We have a real data set and some questions to address:
- Pando foliar fungi
- Are trees selecting their fungal endophyte or epiphyte communities?
- Do foliar fungi follow stochastic assembly?
- What spatial structure is there?
Hypotheses:
- We will use this data set to learn how to process and analyze microbiome data
- We will then write a paper as a class to present our findings for peer-review
Image from Wikimedia Commons
“Pando is believed to be the largest, most dense organism ever found at nearly 13 million pounds. The clone spreads over 106 acres, consisting of over 40,000 individual trees. The exact age of the clone and its root system is difficult to calculate, but it is estimated to have started at the end of the last ice age. Some of the trees are over 130 years old. It was first recognized by researchers in the 1970s and more recently proven by geneticists. Its massive size, weight, and prehistoric age have caused worldwide fame.”
We are using Pando as a natural laboratory to study the biogeography and assembly of fungi associated with plant leaves. These fungi, known as endophytes (inside the leaves) and epiphytes (on the surface of the leaves), are important components of the plant microbiome. They modify plant disease severity, alter plant phenotype, and can even help plants resist common stressors such as drought. One important question is where plants get their foliar fungi. In some grasses, they are passed along inside seeds (vertical transmission), but in dicot plants like Pando, they are assembled from the environment (horizontal transmission).
They blow in on the wind or are carried by animals or raindrops. Interestingly, not all fungi can form a healthy, stable relationship with all plants. There is likely some environmental filtering going on: the abiotic environment, the plant itself, or both may be selecting which fungi make it into plant leaves.
Most studies of this nature have to contend with the fact that each plant sampled in the field has a different genotype, even when every sample comes from the same plant species. We don’t have that problem in this study, since every tree in Pando is part of the same genetically identical clone.
This means we can sample leaves from across Pando and study the spatial structure of their fungal communities without worrying about plant genotype.
Geoff Zahn, Josh Leon, and Austen Miller inside the Pando clone
In September 2023, we collected leaf samples from all over the Pando clone and extracted DNA from both the leaf surfaces and interiors. Those samples were sequenced on an Illumina MiSeq system (2 x 300 bp). That is where our work starts. We will use bioinformatics tools to turn the raw DNA reads into fungal community data that we can work with and explore. We will then test hypotheses about environmental filtering of foliar fungi.
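To make "fungal community data" concrete, here is a minimal sketch of the kind of table we are aiming for: one row per sample, one column per unique sequence, and read counts in the cells. It is written in Python purely for illustration (our course work will be in R), and the sample names, sequences, and the `build_community_table` helper are all hypothetical. The real workflow also includes steps such as quality filtering, error correction, and taxonomy assignment that this sketch skips.

```python
from collections import Counter, defaultdict

# Hypothetical toy input: (sample_id, sequence) pairs standing in for
# already-cleaned reads. Real reads are far longer and far more numerous.
reads = [
    ("leaf01_endo", "ACGTTG"),
    ("leaf01_endo", "ACGTTG"),
    ("leaf01_endo", "GGCATA"),
    ("leaf01_epi", "GGCATA"),
    ("leaf01_epi", "TTAGCC"),
    ("leaf01_epi", "TTAGCC"),
    ("leaf01_epi", "TTAGCC"),
]


def build_community_table(reads):
    """Count how often each unique sequence occurs in each sample."""
    counts = defaultdict(Counter)
    for sample_id, sequence in reads:
        counts[sample_id][sequence] += 1
    # Use one sorted column order so every sample row lines up.
    taxa = sorted({sequence for _, sequence in reads})
    table = {s: [counts[s][t] for t in taxa] for s in sorted(counts)}
    return taxa, table


taxa, table = build_community_table(reads)
print("sample", *taxa, sep="\t")
for sample_id, row in table.items():
    print(sample_id, *row, sep="\t")
```

Each row is one community. Almost every analysis later in the course starts from a table shaped like this, paired with a metadata table describing each sample.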
Some questions we can ask:
- Do foliar fungi reflect their immediate environment? That is, do samples that are physically close to each other share more similar fungal communities than expected by chance?
- Are there any “edge effects” in our communities? Are samples from the edge of the clone more similar to each other than those deeper inside the forest patch?
- Epiphytes might ‘just be there’ but endophytes are living inside the plant tissue. So do we see contrasting patterns between the two categories of fungi? If so, this could be evidence for environmental filtering of endophytes. And if there’s no geographic structure in endophytes, it could suggest that the plants themselves are doing the filtering.
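The first question above (are nearby samples more similar than expected by chance?) can be framed as a Mantel-style permutation test: correlate pairwise geographic distance with pairwise community dissimilarity, then shuffle the sample labels many times to see how often chance alone produces a correlation that strong. The sketch below is in Python for illustration only (we will do this in R), and the coordinates, counts, and function names are made up rather than taken from our data.

```python
import math
import random


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors (0 = identical)."""
    total = sum(a) + sum(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / total if total else 0.0


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


def pairwise(items, dist):
    """Full symmetric matrix of distances between all pairs of items."""
    return [[dist(a, b) for b in items] for a in items]


def upper(matrix, order):
    """Upper-triangle values after relabelling samples by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(geo, comm, permutations=999, seed=1):
    """Return (r, p): the observed correlation and a one-sided permutation p-value."""
    identity = list(range(len(geo)))
    geo_vec = upper(geo, identity)
    observed = pearson(geo_vec, upper(comm, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # shuffle which community belongs to which location
        if pearson(geo_vec, upper(comm, order)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical samples along a transect, with communities that shift gradually.
coords = [(0, 0), (10, 0), (20, 0), (30, 0), (40, 0), (50, 0)]
counts = [
    [9, 1, 0, 0],
    [8, 2, 0, 0],
    [5, 4, 1, 0],
    [2, 5, 3, 0],
    [0, 3, 5, 2],
    [0, 1, 4, 5],
]

r, p = mantel(pairwise(coords, math.dist), pairwise(counts, bray_curtis))
print(f"Mantel r = {r:.2f}, permutation p = {p:.3f}")
```

A strongly positive r with a small p would suggest distance-decay: the farther apart two samples are, the less their communities resemble each other. Running the same test separately on endophytes and epiphytes is one way to look for the contrasting patterns described in the last question.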
You will have lots of readings in this course.
We will use a shared Zotero library to keep track of all our papers.
Start by finding and reading these two papers:
Darcy, J. L., Swift, S. O. I., Cobian, G. M., Zahn, G. L., Perry, B. A., & Amend, A. S. (2020). Fungal communities living within leaves of native Hawaiian dicots are structured by landscape-scale variables as well as by host plants. Molecular Ecology, 29(16), 3102–3115. https://doi.org/10.1111/mec.15544
Zahn, G. (2022). Marker Genes (16S and ITS) Protocol for Plant Microbiome Analyses. Bio-protocol, 12(8). https://doi.org/10.21769/BioProtoc.4395
Here’s an overview of the different sites where we sampled
(Waiting for metadata spreadsheet from Austen)
Brief methods
Study sites
Sample processing
Data analysis
Here’s where we come in. We’ve got raw ITS sequence data from this study and need to process it, explore and visualize it, and test hypotheses. Our final codebase will be deposited as part of the publication, along with any figures and statistical results we develop.
Logistics
We will use R (and some Bash) along with Git/GitHub to conduct all of our work.
Beyond the nitty gritty of coding, we will also be learning a lot about community ecology.
Some potential packages we will learn:
Here’s an example code archive for this type of work: Workshop Repository.
Here’s a Bio-protocol paper walking through the workshop code: 16S Recipe. You’ll need to create a free account to download it.
Expectations and evaluation
Grades will be based on assignments and code contributions.
Assignments
During the semester, several assignments will be given related to the course material. Examples include:
- Looking up and reporting on alternative parameters for certain functions
- Finding and presenting papers about relevant topics
- Coding assignments such as novel figure generation
- Annotated bibliographies on background and discussion topics
- In-class participation in discussion and hypothesis generation
Code contributions
Each student is expected to contribute to our final codebase. Comment lines denoting code authorship will be included in the final paper.
Writing
Each student is expected to contribute to writing, background reading/research/references, and editing. Students with low participation will not earn authorship on our paper, but grades will not be based on writing.
Working topics (subject to revision):
- What is meta-amplicon technology?
- The Earth Microbiome Project
- Basics of community ecology
  - Who is there?
  - What are they doing?
  - How do they interact with each other?
  - How does the environment shape community structure?
- Community assembly
- Distributional ecology
- Analytical methods in community ecology
  - Normalization / rarefaction
  - Alpha, beta, gamma diversity
  - Mantel / MRM / PERMANOVA / Ordination / Networks
  - Differential tests
- Technological methods and limitations
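As a preview of two of the analytical topics listed above, rarefaction and alpha diversity, here is a small sketch. Rarefaction randomly subsamples every sample down to the same number of reads so that sequencing depth does not masquerade as diversity. Alpha diversity then summarizes a single sample, here as richness (the number of taxa present) and the Shannon index. The code is Python for illustration only (the course itself uses R), and the counts and function names are hypothetical.

```python
import math
import random


def richness(counts):
    """Number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon diversity index (natural log) of a count vector."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def rarefy(counts, depth, seed=1):
    """Randomly subsample `depth` reads, without replacement, from one sample."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand counts into one entry per read, labelled by taxon index.
    pool = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    picked = random.Random(seed).sample(pool, depth)
    rarefied = [0] * len(counts)
    for taxon in picked:
        rarefied[taxon] += 1
    return rarefied


# Hypothetical samples sequenced to very different depths.
samples = {
    "deep": [500, 300, 150, 40, 8, 2],   # 1000 reads
    "shallow": [60, 30, 10, 0, 0, 0],    # 100 reads
}
depth = min(sum(c) for c in samples.values())

for name, counts in samples.items():
    sub = rarefy(counts, depth)
    print(name, "raw richness:", richness(counts),
          "rarefied richness:", richness(sub),
          "rarefied Shannon:", round(shannon(sub), 2))
```

Rarefying discards data, which is one reason normalization is a topic we will discuss rather than a settled recipe.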
Weekly tasks and assignments
Assignment 1
- Read a paper several times and compile questions
Assignment 2
- Literature/resource search and annotation
Assignment 3
- TBD
Assignment 4
- TBD
Assignment 5
- TBD
Assignment 6
- TBD
Assignment 7
- TBD
Assignment 8
- TBD
Assignment 9
- TBD
Assignment 10
- TBD